ABSTRACT



A multiprocessing computer system employing a three-hop
communications protocol. When a request is sent by a 
requesting node to a home node, the home node sends read 
and/or invalidate demands to any slave nodes holding 
cached copies of the requested data. The demands from the 
home node to the slave nodes may each advantageously 
include a value indicative of the number of replies the 
requesting agent should expect to receive. The slaves reply 
back to the requesting node with either data or an acknowledge.
Each reply may further include the number of replies
the requester should expect. Upon receiving all expected
replies, the requesting node may send a completion message
back to the home and may treat the transaction as completed 
and proceed with subsequent processing. 

24 Claims, 16 Drawing Sheets 
MULTIPROCESSING SYSTEM EMPLOYING
A THREE-HOP COMMUNICATION 
PROTOCOL 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to the field of multiprocessor com- 
puter systems and, more particularly, to communication 
protocols employed within multiprocessor computer sys- 
tems having distributed shared memory architectures. 

2. Description of the Relevant Art 
Multiprocessing computer systems include two or more
processors which may be employed to perform computing
tasks. A particular computing task may be performed upon 
one processor while other processors perform unrelated 
computing tasks. Alternatively, components of a particular 
computing task may be distributed among multiple proces- 
sors to decrease the time required to perform the computing 
task as a whole. Generally speaking, a processor is a device 
configured to perform an operation upon one or more 
operands to produce a result. The operation is performed in
response to an instruction executed by the processor. 

A popular architecture in commercial multiprocessing 
computer systems is the symmetric multiprocessor (SMP) 
architecture. Typically, an SMP computer system comprises 
multiple processors connected through a cache hierarchy to 
a shared bus. Additionally connected to the bus is a memory, 
which is shared among the processors in the system. Access 
to any particular memory location within the memory occurs 
in a similar amount of time as access to any other particular 
memory location. Since each location in the memory may be 
accessed in a uniform manner, this structure is often referred 
to as a uniform memory architecture (UMA). 

Processors are often configured with internal caches, and 
one or more caches are typically included in the cache 
hierarchy between the processors and the shared bus in an 
SMP computer system. Multiple copies of data residing at a 
particular main memory address may be stored in these
caches. In order to maintain the shared memory model, in 
which a particular address stores exactly one data value at 
any given time, shared bus computer systems employ cache 
coherency. Generally speaking, an operation is coherent if 
the effects of the operation upon data stored at a particular 
memory address are reflected in each copy of the data within 
the cache hierarchy. For example, when data stored at a 
particular memory address is updated, the update may be 
supplied to the caches which are storing copies of the 
previous data. Alternatively, the copies of the previous data 
may be invalidated in the caches such that a subsequent 
access to the particular memory address causes the updated 
copy to be transferred from main memory. For shared bus 
systems, a snoop bus protocol is typically employed. Each 
coherent transaction performed upon the shared bus is 
examined (or "snooped") against data in the caches. If a 
copy of the affected data is found, the state of the cache line 
containing the data may be updated in response to the 
coherent transaction. 
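
The invalidation alternative described above can be illustrated with a minimal sketch. It assumes a write-through policy and a single-value cache line, neither of which is stated in the text; the class and method names (`SnoopingCache`, `SharedBus`, `snoop_write`) are hypothetical and chosen only for illustration.

```python
class SnoopingCache:
    """One cache attached to a shared bus (write-invalidate policy)."""

    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> cached value

    def snoop_write(self, address):
        # Another agent wrote this address: drop our now-stale copy.
        self.lines.pop(address, None)


class SharedBus:
    """Shared bus on which every coherent write is snooped by all caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, cache, address):
        # On a miss, fetch the current value from main memory.
        if address not in cache.lines:
            cache.lines[address] = self.memory[address]
        return cache.lines[address]

    def write(self, cache, address, value):
        # Assumption: write-through, so main memory is always current.
        self.memory[address] = value
        cache.lines[address] = value
        for other in self.caches:
            if other is not cache:
                other.snoop_write(address)


memory = {0x40: 1}
bus = SharedBus(memory)
a, b = SnoopingCache("A"), SnoopingCache("B")
bus.attach(a)
bus.attach(b)
bus.read(a, 0x40)
bus.read(b, 0x40)
bus.write(a, 0x40, 2)
b_invalidated = 0x40 not in b.lines   # B's copy was dropped by the snoop
value_seen_by_b = bus.read(b, 0x40)   # B misses and refetches from memory
```

After the write by cache A, the subsequent read by cache B misses and obtains the updated value from main memory, which is the behavior the paragraph describes.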

Unfortunately, shared bus architectures suffer from several
drawbacks which limit their usefulness in multiprocessing
computer systems. A bus is capable of a peak bandwidth
(e.g. a number of bytes/second which may be transferred 
across the bus). As additional processors are attached to the 
bus, the bandwidth required to supply the processors with 
data and instructions may exceed the peak bus bandwidth. 
Since some processors are forced to wait for available bus 
bandwidth, performance of the computer system suffers 
when the bandwidth requirements of the processors exceed
available bus bandwidth.

Additionally, adding more processors to a shared bus
increases the capacitive loading on the bus and may even
cause the physical length of the bus to be increased. The
increased capacitive loading and extended bus length
increase the delay in propagating a signal across the bus.
Due to the increased propagation delay, transactions may
take longer to perform. Therefore, the peak bandwidth of the
bus may decrease as more processors are added.

These problems are further magnified by the continued
increase in operating frequency and performance of processors.
The increased performance enabled by the higher
frequencies and more advanced processor microarchitectures
results in higher bandwidth requirements than previous
processor generations, even for the same number of processors.
Therefore, buses which previously provided sufficient
bandwidth for a multiprocessing computer system may be
insufficient for a similar computer system employing the
higher performance processors.

Another structure for multiprocessing computer systems
is a distributed shared memory architecture. A distributed
shared memory architecture includes multiple nodes within
which processors and memory reside. The multiple nodes
communicate via a network coupled therebetween. When
considered as a whole, the memory included within the
multiple nodes forms the shared memory for the computer
system. Typically, directories are used to identify which
nodes have cached copies of data corresponding to a particular
address. Coherency activities may be generated via
examination of the directories.
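
As an illustration of how a directory can drive coherency activity, the following minimal sketch records which nodes hold a copy of an address and derives the nodes that must be notified on a write. The `Directory` class and its methods are hypothetical names, and the set-of-sharers representation is an assumption, not a format taken from this document.

```python
from collections import defaultdict


class Directory:
    """Tracks, per address, which nodes hold a cached copy."""

    def __init__(self):
        self.sharers = defaultdict(set)  # address -> set of node names

    def record_copy(self, address, node):
        self.sharers[address].add(node)

    def invalidation_targets(self, address, writer):
        # Every node holding a copy, other than the writer, must be told.
        return sorted(self.sharers[address] - {writer})

    def complete_write(self, address, writer):
        # After invalidation, only the writer still holds the data.
        self.sharers[address] = {writer}


directory = Directory()
for node in ("node_A", "node_B", "node_C"):
    directory.record_copy(0x2000, node)

targets = directory.invalidation_targets(0x2000, writer="node_A")
directory.complete_write(0x2000, writer="node_A")
```

Only the nodes listed in the directory entry receive coherency traffic, in contrast to a snooped bus on which every cache examines every transaction.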

Distributed shared memory systems are scalable, overcoming
the limitations of the shared bus architecture. Since
many of the processor accesses are completed within a node,
nodes typically have much lower bandwidth requirements
upon the network than a shared bus architecture must
provide upon its shared bus. The nodes may operate at high
clock frequency and bandwidth, accessing the network when
needed. Additional nodes may be added to the network
without affecting the local bandwidth of the nodes. Instead,
only the network bandwidth is affected.

The coherence between nodes in a distributed shared
memory system is often kept using a distributed implementation
of coherence protocols. Many such coherence protocols
employ four-hop replies wherein a request is first sent
to a home node from a requesting node. The home node
responsively sends read/invalidate demands to slave nodes
holding cached copies of the data. The slaves reply back to
the home node according to the demands. The four-hop reply
protocol is completed when the home node replies back to
the requesting node.
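
The four-hop exchange just described can be written out as a message trace. This is a sketch only: the function name `four_hop_trace` and the tuple layout (sender, receiver, message kind) are assumptions made for illustration.

```python
def four_hop_trace(requester, home, slaves):
    """Return the messages of one transaction, grouped by hop."""
    return [
        [(requester, home, "request")],                   # hop 1
        [(home, slave, "demand") for slave in slaves],    # hop 2
        [(slave, home, "reply") for slave in slaves],     # hop 3
        [(home, requester, "reply")],                     # hop 4
    ]


trace = four_hop_trace("R", "H", ["S1", "S2"])
hops_on_critical_path = len(trace)
total_messages = sum(len(hop) for hop in trace)
```

With two slave nodes, the requester waits through four sequential hops and the transaction places six messages on the network.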

Unfortunately, the communication patterns generated
when data must be accessed from a remote node cause a
significant amount of network traffic. In addition, after all
slave nodes have replied to the home node, the requesting
node must wait until the home node sends a completion
indication back to the requesting node before the requesting
node can treat the transaction as completed. This may add to
the overall latency of the critical path associated with the
coherency transaction.

A multiprocessor computer system having a distributed
shared memory system is thus desirable wherein network 
traffic is reduced and wherein the latency in replying to a 
requesting node is reduced. 
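
The alternative outlined in the abstract, in which slaves reply directly to the requesting node and every demand and reply carries the number of replies to expect, might be sketched as follows. The message classes, field names, and the `Requester` class are hypothetical; the document does not specify message formats at this point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Demand:
    target: str
    expected_replies: int   # count the requester should expect


@dataclass
class Reply:
    sender: str
    expected_replies: int   # copied from the demand
    data: Optional[bytes] = None   # None models a bare acknowledge


def home_issue_demands(slaves_with_copies):
    # The home node knows how many slaves it is demanding from.
    count = len(slaves_with_copies)
    return [Demand(target=s, expected_replies=count) for s in slaves_with_copies]


def slave_reply(demand, owned_data=None):
    # A slave replies to the requester, not to the home node.
    return Reply(demand.target, demand.expected_replies, owned_data)


class Requester:
    def __init__(self):
        self.received = 0
        self.data = None
        self.completion_sent = False

    def on_reply(self, reply):
        self.received += 1
        if reply.data is not None:
            self.data = reply.data
        if self.received == reply.expected_replies:
            # All replies are in: notify the home node and proceed.
            self.completion_sent = True
        return self.completion_sent


demands = home_issue_demands(["S1", "S2"])
requester = Requester()
done_after_first = requester.on_reply(slave_reply(demands[0], owned_data=b"line"))
done_after_second = requester.on_reply(slave_reply(demands[1]))
```

Because any single reply tells the requester how many replies to wait for, the requester can decide on its own when the transaction is complete, and the completion message to the home node falls outside the critical path.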

SUMMARY OF THE INVENTION

The problems outlined above are in large part solved by
a multiprocessor computer system employing a communications
protocol in accordance with the present invention. In
one embodiment, when a request is sent by a requesting
node to a home node, the home node sends read and/or
invalidate demands to any slave nodes holding cached
copies of the requested data. The demands from the home
node to the slave nodes may each advantageously include a
value indicative of the number of replies the requesting
agent should expect to receive. The slaves reply back to the
requesting node with either data or an acknowledge. Each
reply may further include the number of replies the requester
should expect. Upon receiving all expected replies, the
requesting node may treat the transaction as completed and
proceed with subsequent processing. In this manner, all
communications may require at most a three-hop communication
on the critical path of the cache coherence protocol.
Accordingly, the overall network traffic as a result of the
cache coherence protocol may be advantageously reduced.
Furthermore, the latency of the critical path for a requesting
node to complete a transaction may be reduced.

In one implementation, after the requesting node has
received all expected replies, the requesting node may send
a completion message back to the home. The home node
may then remove a "block" placed upon the coherency unit
of the completed transaction.

The requesting node may further or alternatively send
data back to the home node to achieve memory reflection
after receiving data from a slave node. Furthermore, in cases
where the home node contains the requested data in an
appropriate state, e.g., state shared for a read-to-own request,
the home node does not send any demands to other nodes.
Instead, the home node replies directly to the requesting
node.

A system and method in accordance with the present
invention may advantageously allow for an efficient and
simple implementation of a global coherency protocol in a
multiprocessing computer system. The protocol allows for
an owner-based protocol wherein several dirty cached copies
may reside in differing nodes with one of them in the
owner state and a copy in the home node which is stale.

Broadly speaking, the present invention contemplates a
multiprocessing computer system including a plurality of
processing nodes interconnected by a network. The
multiprocessing computer system comprises a request agent
configured to generate a coherency request, a home agent
coupled to receive the coherency request through the network
and to generate a coherency demand in response to the
coherency request, and a slave agent coupled to receive the
coherency demand through the network and to generate a
coherency reply in response to the coherency demand. The
request agent is further configured to receive the coherency
reply through the network.

The invention further contemplates a method for maintaining
coherency in a multiprocessing computer system
including a plurality of processing nodes interconnected by
a network. The method comprises a request agent generating
a coherency request, a home agent receiving the coherency
request through the network, and the home agent generating
a coherency demand in response to the coherency request.
The method further comprises a slave agent receiving the
coherency demand through the network, the slave agent
generating a coherency reply in response to the coherency
demand, and the request agent receiving the coherency reply
through the network.

BRIEF DESCRIPTION OF THE DRAWINGS

Other objects and advantages of the invention will
become apparent upon reading the following detailed
description and upon reference to the accompanying drawings
in which:

FIG. 1 is a block diagram of a multiprocessor computer
system.

FIG. 1A is a conceptualized block diagram depicting a
non-uniform memory architecture supported by one
embodiment of the computer system shown in FIG. 1.

FIG. 1B is a conceptualized block diagram depicting a
cache-only memory architecture supported by one embodiment
of the computer system shown in FIG. 1.

FIG. 2 is a block diagram of one embodiment of a
symmetric multiprocessing node depicted in FIG. 1.

FIG. 2A is an exemplary directory entry stored in one
embodiment of a directory depicted in FIG. 2.

FIG. 3 is a block diagram of one embodiment of a system
interface [...]

FIG. 4 is a diagram depicting activities performed in
response to a typical coherency operation between a request
agent, a home agent, and a slave agent.

FIG. 5A is a diagram of an exemplary coherency operation
performed in response to a read to own request from a
processor.

FIG. 5B is a diagram depicting coherency activity in
response to a read to own request when a slave agent is the
current owner of the coherency unit and other slave agents
have shared copies of the coherency unit.

FIG. 5C is a diagram that depicts coherency activity when
[...] request to a home agent.

FIG. 5D is a diagram depicting coherency activity in
response to a read to share request when a slave is the owner
of the coherency unit.

FIG. 6 is a flowchart depicting an exemplary state
machine for one embodiment of a request agent shown in
FIG. 3.

FIG. 7 is a flowchart depicting an exemplary state
machine for one embodiment of a home agent shown in FIG. 3.

FIG. 8 is a flowchart depicting an exemplary state
machine for one embodiment of a slave agent shown in FIG. 3.

FIG. 9 is a table listing [...] according to one
embodiment of the system interface.

FIG. 10 is a table listing demand types according to one
embodiment of the system interface.

FIG. 11 is a table listing reply types according to one
embodiment of the system interface.

FIG. 12 is a table listing completion types according to
one embodiment of the system interface.

FIG. 13 is a table describing coherency operations in
response to various operations performed by a processor,
according to one embodiment of the system interface.

While the invention is susceptible to various modifications
and alternative forms, specific embodiments thereof
are shown by way of example in the drawings and will
herein be described in detail. It should be understood,
however, that the drawings and detailed description thereto
are not intended to limit the invention to the particular form
disclosed, but on the contrary, the intention is to cover all
modifications, equivalents and alternatives falling within the
spirit and scope of the present invention as defined by the
appended claims.

DETAILED DESCRIPTION OF THE INVENTION

Turning now to FIG. 1, a block diagram of one embodiment
of a multiprocessing computer system 10 is shown.
Computer system 10 includes multiple SMP nodes
12A-12D interconnected by a point-to-point network 14.
Elements referred to herein with a particular reference
number followed by a letter will be collectively referred to
by the reference number alone. For example, SMP nodes
12A-12D will be collectively referred to as SMP nodes 12.
In the embodiment shown, each SMP node 12 includes
multiple processors, external caches, an SMP bus, a
memory, and a system interface. For example, SMP node
12A is configured with multiple processors including
processors 16A-16B. The processors 16 are connected to
external caches 18, which are further coupled to an SMP bus
20. Additionally, a memory 22 and a system interface 24 are
coupled to SMP bus 20. Still further, one or more
input/output (I/O) interfaces 26 may be coupled to SMP bus 20.
I/O interfaces 26 are used to interface to peripheral devices
such as serial and parallel ports, disk drives, modems,
printers, etc. Other SMP nodes 12B-12D may be configured
similarly.

Generally speaking, for any given transaction a particular
SMP node 12 may serve as a requesting node, a home node,
or a slave node. When a request is sent by a requesting node
to a home node, the home node sends read and/or invalidate
demands to any slave nodes holding cached copies of the
requested data. The demands from the home node to the
slave nodes advantageously include a value indicative of
the number of replies the requesting agent should expect to
receive. The slaves reply back to the requesting node with
either data or an acknowledge. Each reply may further
include the number of replies the requester should expect.
Upon receiving all expected replies, the requesting node
may treat the transaction as completed and proceed with
subsequent processing. In this manner, all communications
may require at most a three-hop communication on the
critical path of the cache coherence protocol. Accordingly,
the overall network traffic as a result of the cache coherence
protocol may be advantageously reduced. Furthermore, the
latency of the critical path for a requesting node to complete
a transaction may be reduced.

In one implementation, after the requesting node has
received all expected replies, the requesting node may send
a completion message back to the home. The home node
may remove a "block" placed upon the coherency unit of the
completed transaction.

The requesting node may further or alternatively send
data back to the home node to achieve memory reflection
after receiving data from a slave node. Furthermore, in cases
where the home node contains the requested data in an
appropriate state, e.g., state shared for a read-to-own request,
the home node does not send any demands to other nodes.
Instead, the home node replies directly to the requesting
node. Further details regarding the communication protocol
associated with system 10 are provided further below.

As used herein, a memory operation is an operation
causing transfer of data from a source to a destination. The
source and/or destination may be storage locations within
the initiator, or may be storage locations within memory.
When a source or destination is a storage location within
memory, the source or destination is specified via an address
conveyed with the memory operation. Memory operations
may be read or write operations. A read operation causes
transfer of data from a source outside of the initiator to a
destination within the initiator. Conversely, a write operation
causes transfer of data from a source within the initiator to
a destination outside of the initiator. In the computer system
shown in FIG. 1, a memory operation may include one or
more transactions upon SMP bus 20 as well as one or more
coherency operations upon network 14.

Each SMP node 12 is essentially an SMP system having
memory 22 as the shared memory. Processors 16 are high
performance processors. In one embodiment, each processor
16 is a SPARC processor compliant with version 9 of the
SPARC processor architecture. It is noted, however, that any
processor architecture may be employed by processors 16.

Typically, processors 16 include internal instruction and
data caches. Therefore, external caches 18 are labeled as L2
caches (for level 2, wherein the internal caches are level 1
caches). If processors 16 are not configured with internal
caches, then external caches 18 are level 1 caches. It is noted
that the "level" nomenclature is used to identify proximity of
a particular cache to the processing core within processor 16.
Level 1 is nearest the processing core, level 2 is next nearest,
[...] accessed by the processor 16 coupled [...] external
caches 18 may be configured in a variety of specific cache
arrangements. For example, set-associative or direct-mapped
configurations may be employed by external caches 18.

SMP bus 20 accommodates communication between
processors 16 (through caches 18), memory 22, system
interface 24, and I/O interface 26. In one embodiment, SMP bus
20 includes an address bus and related control signals, as
well as a data bus and related control signals. Because the
address and data buses are separate, a split-transaction bus
protocol may be employed upon SMP bus 20. Generally
speaking, a split-transaction bus protocol is a protocol in
which a transaction occurring upon the address bus may
differ from a concurrent transaction occurring upon the data
bus. Transactions involving address and data include an
address phase in which the address and related control
information is conveyed upon the address bus, and a data
phase in which the data is conveyed upon the data bus.
Additional address phases and/or data phases for other
transactions may be initiated prior to the data phase
corresponding to a particular address phase. An address phase and
the corresponding data phase may be correlated in a number
of ways. For example, data transactions may occur in the
same order that the address transactions occur. Alternatively,
address and data phases of a transaction may be identified
via a unique tag.

Memory 22 is configured to store data and instruction
code for use by processors 16. Memory 22 preferably
comprises dynamic random access memory (DRAM),
although any type of memory may be used. Memory 22, in
conjunction with similar illustrated memories in the other
SMP nodes 12, forms a distributed shared memory system.
Each address in the address space of the distributed shared
memory is assigned to a particular node, referred to as the
home node of the address. A processor within a different
node than the home node may access the data at an address
of the home node, potentially caching the data. Therefore,
coherency is maintained between SMP nodes 12 as well as
among processors 16 and caches 18 within a particular SMP
node 12A-12D. System interface 24 provides internode
coherency, while snooping upon SMP bus 20 provides
intranode coherency.

In addition to maintaining internode coherency, system
interface 24 detects addresses upon SMP bus 20 which
require a data transfer to or from another SMP node 12.
System interface 24 performs the transfer, and provides the
corresponding data for the transaction upon SMP bus 20. In
the embodiment shown, system interface 24 is coupled to a
point-to-point network 14. However, it is noted that in
alternative embodiments other networks may be used. In a
point-to-point network, individual connections exist
between each node upon the network. A particular node
communicates directly with a second node via a dedicated
link. To communicate with a third node, the particular node
utilizes a different link than the one used to communicate
with the second node.

It is noted that, although four SMP nodes 12 are shown in
FIG. 1, embodiments of computer system 10 employing any
number of nodes are contemplated.

FIGS. 1A and 1B are conceptualized illustrations of
distributed memory architectures supported by one embodiment
of computer system 10. Specifically, FIGS. 1A and 1B
illustrate alternative ways in which each SMP node 12 of
FIG. 1 may cache data and perform memory accesses.
Details regarding the manner in which computer system 10
supports such accesses will be described in further detail
below.

Turning now to FIG. 1A, a logical diagram depicting a
first memory architecture 30 supported by one embodiment
of computer system 10 is shown. Architecture 30 includes
multiple processors 32A-32D, multiple caches 34A-34D,
multiple memories 36A-36D, and an interconnect network
38. The multiple memories 36 form a distributed shared
memory. Each address within the address space corresponds
to a location within one of memories 36.

Architecture 30 is a non-uniform memory architecture
(NUMA). In a NUMA architecture, the amount of time
required to access a first memory address may be substantially
different than the amount of time required to access a
second memory address. The access time depends upon the
origin of the access and the location of the memory
36A-36D which stores the accessed data. For example, if
processor 32A accesses a first memory address stored in
memory 36A, the access time may be significantly shorter
than the access time for an access to a second memory
address stored in one of memories 36B-36D. That is, an
access by processor 32A to memory 36A may be completed
locally (e.g. without transfers upon network 38), while a
processor 32A access to memory 36B is performed via
network 38. Typically, an access through network 38 is
slower than an access completed within a local memory. For
example, a local access might be completed in a few
hundred nanoseconds while an access via the network might
occupy a few microseconds.

Data corresponding to addresses stored in remote nodes
may be cached in any of the caches 34. However, once a
cache 34 discards the data corresponding to such a remote
address, a subsequent access to the remote address is completed
via a transfer upon network 38.

NUMA architectures may provide excellent performance
characteristics for software applications which use addresses
that correspond primarily to a particular local memory.
Software applications which exhibit more random access
patterns and which do not confine their memory accesses to
addresses within a particular local memory, on the other
hand, may experience a large amount of network traffic as a
particular processor 32 performs repeated accesses to remote
nodes.

Turning now to FIG. 1B, a logic diagram depicting a
second memory architecture 40 supported by the computer
system 10 of FIG. 1 is shown. Architecture 40 includes
multiple processors 42A-42D, multiple caches 44A-44D,
multiple memories 46A-46D, and network 48. However,
memories 46 are logically coupled between caches 44 and
network 48. Memories 46 serve as larger caches (e.g. a level
3 cache), storing addresses which are accessed by the
corresponding processors 42. Memories 46 are said to
"attract" the data being operated upon by a corresponding
processor 42. As opposed to the NUMA architecture shown
in FIG. 1A, architecture 40 reduces the number of accesses
upon the network 48 by storing remote data in the local
memory when the local processor accesses that data.

Architecture 40 is referred to as a cache-only memory
architecture (COMA). Multiple locations within the distributed
shared memory formed by the combination of memories
46 may store data corresponding to a particular address.
No permanent mapping of a particular address to a particular
storage location is assigned. Instead, the location storing
data corresponding to the particular address changes
dynamically based upon the processors 42 which access that
particular address. Conversely, in the NUMA architecture a
particular storage location within memories 46 is assigned to
a particular address. Architecture 40 adjusts to the memory
access patterns performed by applications executing
thereon, and coherency is maintained between the memories
46.

In a preferred embodiment, computer system 10 supports
both of the memory architectures shown in FIGS. 1A and
1B. In particular, a memory address may be accessed in a
NUMA fashion from one SMP node 12A-12D while being
accessed in a COMA manner from another SMP node
12A-12D. In one embodiment, a NUMA access is detected
if certain bits of the address upon SMP bus 20 identify
another SMP node 12 as the home node of the address
presented. Otherwise, a COMA access is presumed. Additional
details will be provided below.

In one embodiment, the COMA architecture is implemented
using a combination of hardware and software
techniques. Hardware maintains coherency between the
locally cached copies of pages, and software (e.g. the
operating system employed in computer system 10) is
responsible for allocating and deallocating cached pages.

FIG. 2 depicts details of one implementation of an SMP
node 12A that generally conforms to the SMP node 12A
shown in FIG. 1. Other nodes 12 may be configured similarly.
It is noted that alternative specific implementations of
each SMP node 12 of FIG. 1 are also possible. The
implementation of SMP node 12A shown in FIG. 2 includes
multiple subnodes such as subnodes 50A and 50B. Each
subnode 50 includes two processors 16 and corresponding
caches 18, a memory portion 56, an address controller 52,
and a data controller 54. The memory portions 56 within
subnodes 50 collectively form the memory 22 of the SMP
node 12A of FIG. 1. Other subnodes (not shown) are further
coupled to SMP bus 20 to form the I/O interfaces 26.

As shown in FIG. 2, SMP bus 20 includes an address bus
58 and a data bus 60. Address controller 52 is coupled to
address bus 58, and data controller 54 is coupled to data bus
60. FIG. 2 also illustrates system interface 24, including a
system interface logic block 62, a translation storage 64, a
directory 66, and a memory tag (MTAG) 68. Logic block 62
is coupled to both address bus 58 and data bus 60, and
asserts an ignore signal 70 upon address bus 58 under certain
circumstances as will be explained further below.
Additionally, logic block 62 is coupled to translation storage
64, directory 66, MTAG 68, and network 14.

For the embodiment of FIG. 2, each subnode 50 is
configured upon a printed circuit board which may be
inserted into a backplane upon which SMP bus 20 is
situated. In this manner, the number of processors and/or I/O
interfaces 26 included within an SMP node 12 may be varied
by inserting or removing subnodes 50. For example,
computer system 10 may initially be configured with a small
number of subnodes 50. Additional subnodes 50 may be
added from time to time as the computing power required by 
the users of computer system 10 grows. 

Address controller 52 provides an interface between 
caches 18 and the address portion of SMP bus 20. In the 
embodiment shown, address controller 52 includes an out 
queue 72 and some number of in queues 74. Out queue 72 
buffers transactions from the processors connected thereto 
until address controller 52 is granted access to address bus 
58. Address controller 52 performs the transactions stored in 
out queue 72 in the order those transactions were placed into 
out queue 72 (i.e. out queue 72 is a FIFO queue). Transactions
performed by address controller 52 as well as transactions
received from address bus 58 which are to be
snooped by caches 18 and caches internal to processors 16 
are placed into in queue 74. 

Similar to out queue 72, in queue 74 is a FIFO queue. All 
address transactions are stored in the in queue 74 of each 
subnode 50 (even within the in queue 74 of the subnode 50 
which initiates the address transaction). Address transactions 
are thus presented to caches 18 and processors 16 for 
snooping in the order they occur upon address bus 58. The 
order that transactions occur upon address bus 58 is the order 
for SMP node 12A. However, the complete system is 
expected to have one global memory order. This ordering 
expectation creates a problem in both the NUMA and 
COMA architectures employed by computer system 10, 
since the global order may need to be established by the 
order of operations upon network 14. If two nodes perform 
a transaction to an address, the order that the corresponding 
coherency operations occur at the home node for the address 
defines the order of the two transactions as seen within each 
node. For example, if two write transactions are performed 
to the same address, then the second write operation to arrive 
at the address' home node should be the second write 
transaction to complete (i.e. a byte location which is updated 
by both write transactions stores a value provided by the 
second write transaction upon completion of both 
transactions). However, the node which performs the second 
transaction may actually have the second transaction occur 
first upon SMP bus 20. Ignore signal 70 allows the second 
transaction to be transferred to system interface 24 without 
the remainder of the SMP node 12 reacting to the transac- 
tion. 

Therefore, in order to operate effectively with the ordering 
constraints imposed by the out queue/in queue structure of 
address controller 52, system interface logic block 62 
employs ignore signal 70. When a transaction is presented 
upon address bus 58 and system interface logic block 62 
detects that a remote transaction is to be performed in 
response to the transaction, logic block 62 asserts the ignore 
signal 70. Assertion of the ignore signal 70 with respect to
a transaction causes address controller 52 to inhibit storage 
of the transaction into in queues 74. Therefore, other trans- 
actions which may occur subsequent to the ignored trans- 
action and which complete locally within SMP node 12A 
may complete out of order with respect to the ignored 
transaction without violating the ordering rules of in queue
74. In particular, transactions performed by system interface 
24 in response to coherency activity upon network 14 may 
be performed and completed subsequent to the ignored 
transaction. When a response is received from the remote 
transaction, the ignored transaction may be reissued by 
system interface logic block 62 upon address bus 58. The 
transaction is thereby placed into in queue 74, and may 
complete in order with transactions occurring at the time of 
reissue. 
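
The interaction between the FIFO in queue and the ignore signal can be illustrated with a small model. The following Python sketch is illustrative only: the names `AddressController`, `present`, and `simulate` are assumptions and do not appear in the described embodiment. It shows that an ignored transaction takes its place in the in queue at the time of reissue rather than at the time it first appears upon the address bus.

```python
from collections import deque


class AddressController:
    """Toy model of a single in queue (all names are hypothetical)."""

    def __init__(self):
        self.in_queue = deque()  # FIFO: order in which transactions are snooped

    def present(self, txn, ignore=False):
        """Present a transaction upon the address bus.

        The transaction is stored in the in queue unless the ignore
        signal is asserted for it. Returns True if it was queued.
        """
        if ignore:
            return False
        self.in_queue.append(txn)
        return True


def simulate():
    controller = AddressController()
    awaiting_remote = []

    # Transaction "A" requires a remote transaction, so ignore is asserted.
    if not controller.present("A", ignore=True):
        awaiting_remote.append("A")

    # Transaction "B" completes locally and is queued normally.
    controller.present("B")

    # The remote response arrives; "A" is reissued and queued now.
    for txn in awaiting_remote:
        controller.present(txn)

    return list(controller.in_queue)
```

In this model, `simulate()` returns the order B, A: the local transaction B is snooped ahead of the reissued transaction A even though A appeared upon the bus first.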

It is noted that in one embodiment, once a transaction
from a particular address controller 52 has been ignored,
subsequent coherent transactions from that particular
address controller 52 are also ignored. Transactions from a
particular processor 16 may have an important ordering
relationship with respect to each other, independent of the
ordering requirements imposed by presentation upon
address bus 58. For example, a transaction may be separated
from another transaction by a memory synchronizing
instruction such as the MEMBAR instruction included in the
SPARC architecture. The processor 16 conveys the transactions
in the order the transactions are to be performed with
respect to each other. The transactions are ordered within out
queue 72, and therefore the transactions originating from a
particular out queue 72 are to be performed in order.
Ignoring subsequent transactions from a particular
address controller 52 allows the in-order rules for a particular
out queue 72 to be preserved. It is further noted that not
all transactions from a particular processor must be ordered.
However, it is difficult to determine upon address bus 58
which transactions must be ordered and which transactions
need not be ordered. Therefore, in this implementation, logic
block 62 maintains the order of all transactions from a
particular out queue 72. It is noted that other implementations
of subnode 50 are possible that allow exceptions to this
rule.

Data controller 54 routes data to and from data bus 60, 
memory portion 56 and caches 18. Data controller 54 may 
include in and out queues similar to address controller 52. In 
one embodiment, data controller 54 employs multiple physi- 
cal units in a byte-sliced bus configuration. 

Processors 16 as shown in FIG. 2 include memory man- 
agement units (MMUs) 76A-76B. MMUs 76 perform a 
virtual to physical address translation upon the data
addresses generated by the instruction code executed upon 
processors 16, as well as the instruction addresses. The 
addresses generated in response to instruction execution are 
virtual addresses. In other words, the virtual addresses are 
the addresses created by the programmer of the instruction 
code. The virtual addresses are passed through an address 
translation mechanism (embodied in MMUs 76), from 
which corresponding physical addresses are created. The 
physical address identifies a storage location within memory 
22. 

Address translation is performed for many reasons. For 
example, the address translation mechanism may be used to 
grant or deny a particular computing task's access to certain 
memory addresses. In this manner, the data and instructions
within one computing task are isolated from the data and
instructions of another computing task. Additionally, portions
of the data and instructions of a computing task may be
"paged out" to a hard disk drive. When a portion is paged
out, the translation is invalidated. Upon access to the portion
by the computing task, an interrupt occurs due to the failed
translation. The interrupt allows the operating system to
retrieve the corresponding information from the hard disk
drive. In this manner, more virtual memory may be available
than actual memory in memory 22. Many other uses for
virtual memory are well known.

Referring back to the computer system 10 shown in FIG.
1 in conjunction with the SMP node 12A implementation
illustrated in FIG. 2, the physical address computed by
MMUs 76 is a local physical address (LPA) defining a
location within the memory 22 associated with the SMP
node 12 in which the processor 16 is located. MTAG 68
stores a coherency state for each "coherency unit" in
memory 22. When an address transaction is performed upon
SMP bus 20, system interface logic block 62 examines the 
coherency state stored in MTAG 68 for the accessed coher- 
ency unit. If the coherency state indicates that the SMP node 
12 has sufficient access rights to the coherency unit to
perform the access, then the address transaction proceeds. If, 
however, the coherency state indicates that coherency activ- 
ity should be performed prior to completion of the 
transaction, then system interface logic block 62 asserts the 
ignore signal 70. Logic block 62 performs coherency operations
upon network 14 to acquire the appropriate coherency
state. When the appropriate coherency state is acquired, 
logic block 62 reissues the ignored transaction upon SMP 
bus 20. Subsequently, the transaction completes. 

Generally speaking, the coherency state maintained for a
coherency unit at a particular storage location (e.g. a cache 
or a memory 22) indicates the access rights to the coherency 
unit at that SMP node 12. The access right indicates the 
validity of the coherency unit, as well as the read/write 
permission granted for the copy of the coherency unit within
that SMP node 12. In one embodiment, the coherency states 
employed by computer system 10 are modified, owned, 
shared, and invalid. The modified state indicates that the 
SMP node 12 has updated the corresponding coherency unit. 
Therefore, other SMP nodes 12 do not have a copy of the
coherency unit. Additionally, when the modified coherency 
unit is discarded by the SMP node 12, the coherency unit is 
stored back to the home node. The owned state indicates that 
the SMP node 12 is responsible for the coherency unit, but 
other SMP nodes 12 may have shared copies. Again, when
the coherency unit is discarded by the SMP node 12, the 
coherency unit is stored back to the home node. The shared 
state indicates that the SMP node 12 may read the coherency 
unit but may not update the coherency unit without acquiring
the owned state. Additionally, other SMP nodes 12 may
have copies of the coherency unit as well. Finally, the invalid 
state indicates that the SMP node 12 does not have a copy 
of the coherency unit. In one embodiment, the modified state 
indicates write permission and any state but invalid indicates 
read permission to the corresponding coherency unit.
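
The access rights implied by the four coherency states can be summarized in a few lines of code. The following Python sketch is a minimal illustration under the one embodiment just described (write permission only in the modified state, read permission in any state but invalid). The names `CoherencyState`, `may_read`, `may_write`, and `needs_coherency_activity` are assumptions introduced for the example.

```python
from enum import Enum


class CoherencyState(Enum):
    MODIFIED = "M"
    OWNED = "O"
    SHARED = "S"
    INVALID = "I"


def may_read(state):
    # Any state but invalid indicates read permission.
    return state is not CoherencyState.INVALID


def may_write(state):
    # Only the modified state indicates write permission.
    return state is CoherencyState.MODIFIED


def needs_coherency_activity(state, is_write):
    """Return True if the node lacks sufficient access rights.

    This models the case in which the ignore signal would be asserted
    and coherency operations performed before the access completes.
    """
    permitted = may_write(state) if is_write else may_read(state)
    return not permitted
```

Under these assumptions, a write to a coherency unit held in the shared or owned state requires coherency activity, while a read of the same unit proceeds locally.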

As used herein, a coherency unit is a number of contigu- 
ous bytes of memory which are treated as a unit for 
coherency purposes. For example, if one byte within the 
coherency unit is updated, the entire coherency unit is 
considered to be updated. In one specific embodiment, the
coherency unit is a cache line, comprising 64 contiguous 
bytes. It is understood, however, that a coherency unit may 
comprise any number of bytes. 

System interface 24 also includes a translation mechanism 
which utilizes translation storage 64 to store translations
from the local physical address to a global address (GA). 
Certain bits within the global address identify the home node 
for the address, at which coherency information is stored for 
that global address. For example, an embodiment of computer
system 10 may employ four SMP nodes 12 such as that
of FIG. 1. In such an embodiment, two bits of the global 
address identify the home node. Preferably, bits from the 
most significant portion of the global address are used to 
identify the home node. The same bits are used in the local 
physical address to identify NUMA accesses. If the bits of
the LPA indicate that the local node is not the home node, 
then the LPA is a global address and the transaction is 
performed in NUMA mode. Therefore, the operating system 
places global addresses in MMUs 76 for any NUMA-type 
pages. Conversely, the operating system places LPAs in
MMU 76 for any COMA-type pages. It is noted that an LPA
may equal a GA (for NUMA accesses as well as for global
addresses whose home is within the memory 22 in the node
in which the LPA is presented). Alternatively, an LPA may 
be translated to a GA when the LPA identifies storage 
locations used for storing copies of data having a home in 
another SMP node 12. 
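
The use of the most significant address bits to identify the home node may be sketched as follows. This Python example is illustrative only: the address width `ADDRESS_BITS` is an assumed value (no width is stated here), and the function names `home_node`, `is_numa_access`, and `make_address` are hypothetical.

```python
ADDRESS_BITS = 43   # assumed address width; chosen only for illustration
NODE_BITS = 2       # two bits identify one of four SMP nodes


def home_node(address):
    """Return the node number held in the most significant bits."""
    return (address >> (ADDRESS_BITS - NODE_BITS)) & ((1 << NODE_BITS) - 1)


def is_numa_access(lpa, local_node):
    """An LPA whose node bits name another node is a global address,
    and the transaction is performed in NUMA mode."""
    return home_node(lpa) != local_node


def make_address(node, offset):
    """Helper for the example: place the node bits above an offset."""
    return (node << (ADDRESS_BITS - NODE_BITS)) | offset
```

With these assumptions, an address built as `make_address(2, 0x1000)` has node 2 as its home, and presenting it within node 0 would be treated as a NUMA access.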

The directory 66 of a particular home node identifies 
which SMP nodes 12 have copies of data corresponding to 
a given global address assigned to the home node such that 
coherency between the copies may be maintained. 
Additionally, the directory 66 of the home node identifies the 
SMP node 12 which owns the coherency unit. Therefore, 
while local coherency between caches 18 and processors 16 
is maintained via snooping, system-wide (or global) coher- 
ency is maintained using MTAG 68 and directory 66. 
Directory 66 stores the coherency information corresponding
to the coherency units which are assigned to SMP node
12A (i.e., for which SMP node 12A is the home node).

It is noted that for the embodiment of FIG. 2, directory 66 
and MTAG 68 store information for each coherency unit 
(i.e., on a coherency unit basis). Conversely, translation 
storage 64 stores local physical to global address translations 
defined for pages. A page includes multiple coherency units, 
and is typically several kilobytes or even megabytes in size. 

Software accordingly creates local physical address to 
global address translations on a page basis (thereby allocat- 
ing a local memory page for storing a copy of a remotely 
stored global page). Therefore, blocks of memory 22 are 
allocated to a particular global address on a page basis as 
well. However, as stated above, coherency states and coher- 
ency activities are performed upon a coherency unit. 
Therefore, when a page is allocated in memory to a particu- 
lar global address, the data corresponding to the page is not 
necessarily transferred to the allocated memory. Instead, as 
processors 16 access various coherency units within the 
page, those coherency units are transferred from the owner 
of the coherency unit. In this manner, the data actually 
accessed by SMP node 12A is transferred into the corresponding
memory 22. Data not accessed by SMP node 12A
may not be transferred, thereby reducing overall bandwidth 
usage upon network 14 in comparison to embodiments 
which transfer the page of data upon allocation of the page 
in memory 22. 
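
The bandwidth saving from transferring coherency units on demand, rather than a whole page at allocation, can be made concrete with a small model. The sketch below is illustrative only: the 8192-byte page size is an assumption (pages are described only as several kilobytes or larger), and the class `AllocatedPage` is hypothetical.

```python
COHERENCY_UNIT_BYTES = 64   # cache-line-sized unit from one embodiment
PAGE_BYTES = 8192           # assumed page size, for illustration only


class AllocatedPage:
    """Local page allocated for a remotely homed global page."""

    def __init__(self):
        self.present_units = set()   # coherency units already transferred
        self.bytes_transferred = 0

    def access(self, offset):
        """Access a byte offset within the page, fetching its unit if absent."""
        unit = offset // COHERENCY_UNIT_BYTES
        if unit not in self.present_units:
            # Only the accessed coherency unit crosses the network.
            self.present_units.add(unit)
            self.bytes_transferred += COHERENCY_UNIT_BYTES
        return unit
```

For example, accesses at offsets 0, 8, 64, 70, and 4096 touch three distinct coherency units, so 192 bytes are transferred instead of the full page.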

It is noted that in one embodiment, translation storage 64, 
directory 66, and/or MTAG 68 may be caches which store 
only a portion of the associated translation, directory, and 
MTAG information, respectively. The entirety of the 
translation, directory, and MTAG information is stored in 
tables within memory 22 or a dedicated memory storage (not 
shown). If required information for an access is not found in 
the corresponding cache, the tables are accessed by system 
interface 24. 

Turning now to FIG. 2A, an exemplary directory entry 71 
is shown. Directory entry 71 may be employed by one 
embodiment of directory 66 shown in FIG. 2. Other embodi- 
ments of directory 66 may employ dissimilar directory 
entries. Directory entry 71 includes a valid bit 73, a write 
back bit 75, an owner field 77, and a sharers field 79. 
Directory entry 71 resides within the table of directory 
entries, and is located within the table via the global address 
identifying the corresponding coherency unit. More 
particularly, the directory entry 71 associated with a coher- 
ency unit is stored within the table of directory entries at an 
offset formed from the global address which identifies the
coherency unit. 
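
One possible encoding of directory entry 71 is sketched below in Python. It is an illustration under stated assumptions, not the layout of the embodiment: it assumes four SMP nodes, a two-bit owner field, one sharer bit per node, a 64-byte coherency unit, and an arbitrary bit ordering within one byte. The names `DirectoryEntry`, `pack`, `unpack`, and `directory_index` are hypothetical, and the way the table offset is formed from the global address is likewise assumed.

```python
from dataclasses import dataclass

COHERENCY_UNIT_BYTES = 64   # assumed coherency unit size


@dataclass
class DirectoryEntry:
    valid: bool        # valid bit 73
    write_back: bool   # write back bit 75
    owner: int         # owner field 77: node number 0..3
    sharers: int       # sharers field 79: bit n set if node n holds a copy

    def pack(self):
        # Assumed layout, low to high: sharers[4], owner[2], write_back, valid.
        return ((self.sharers & 0xF)
                | ((self.owner & 0x3) << 4)
                | (int(self.write_back) << 6)
                | (int(self.valid) << 7))

    @staticmethod
    def unpack(value):
        return DirectoryEntry(
            valid=bool((value >> 7) & 1),
            write_back=bool((value >> 6) & 1),
            owner=(value >> 4) & 0x3,
            sharers=value & 0xF,
        )


def directory_index(global_address):
    """Assumed table index: one entry per coherency unit."""
    return global_address // COHERENCY_UNIT_BYTES
```

An entry with node 2 as owner and nodes 0 and 2 as sharers round-trips through `pack` and `unpack` unchanged, and every address within the same 64-byte unit maps to the same table index.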

Valid bit 73 indicates, when set, that directory entry 71 is
valid (i.e. that directory entry 71 is storing coherency
information for a corresponding coherency unit). When
clear, valid bit 73 indicates that directory entry 71 is invalid.

Owner field 77 identifies one of SMP nodes 12 as the
owner of the coherency unit. The owning SMP node
12A-12D maintains the coherency unit in either the modified
or owned states. Typically, the owning SMP node
12A-12D acquires the coherency unit in the modified state
(see FIG. 13 below). Subsequently, the owning SMP node
12A-12D may then transition to the owned state upon
providing a copy of the coherency unit to another SMP node
12A-12D. The other SMP node 12A-12D acquires the
coherency unit in the shared state. In one embodiment,
owner field 77 comprises two bits encoded to identify one of
four SMP nodes 12A-12D as the owner of the coherency
unit.

Sharers field 79 includes one bit assigned to each SMP
node 12A-12D. If an SMP node 12A-12D is maintaining a
shared copy of the coherency unit, the corresponding bit
within sharers field 79 is set. Conversely, if the SMP node
12A-12D is not maintaining a shared copy of the coherency
unit, the corresponding bit within sharers field 79 is clear. In
this manner, sharers field 79 indicates all of the shared
copies of the coherency unit which exist within the computer
system 10 of FIG. 1.

Write back bit 75 indicates, when set, that the SMP node
12A-12D identified as the owner of the coherency unit via
owner field 77 has written the updated copy of the coherency
unit to the home SMP node 12. When clear, bit 75 indicates
that the owning SMP node 12A-12D has not written the
updated copy of the coherency unit to the home SMP node
12A-12D.

Turning now to FIG. 3, a block diagram of one embodiment
of system interface 24 is shown. As shown in FIG. 3,
system interface 24 includes directory 66, translation storage
64, and MTAG 68. Translation storage 64 is shown as a
global address to local physical address (GA2LPA) translation
unit 80 and a local physical address to global address
(LPA2GA) translation unit 82.

System interface 24 also includes input and output queues
for storing transactions to be performed upon SMP bus 20 or
network 14. Specifically, for the embodiment shown, system
interface 24 includes input header queue 84 and output
header queue 86 for buffering header packets to and from
network 14. Header packets identify an operation to be
performed, and specify the number and format of any data
packets which may follow. Output header queue 86 buffers
header packets to be transmitted upon network 14, and input
header queue 84 buffers header packets received from
network 14 until system interface 24 processes the received
header packets. Similarly, data packets are buffered in input
data queue 88 and output data queue 90 until the data may
be transferred upon SMP data bus 60 and network 14,
respectively.

SMP out queue 92, SMP in queue 94, and SMP I/O in
queue (PIQ) 96 are used to buffer address transactions to and
from address bus 58. SMP out queue 92 buffers transactions
to be presented by system interface 24 upon address bus 58.
Reissue transactions queued in response to the completion of
coherency activity with respect to an ignored transaction are
buffered in SMP out queue 92. Additionally, transactions
generated in response to coherency activity received from
network 14 are buffered in SMP out queue 92. SMP in queue
94 stores coherency related transactions to be serviced by
system interface 24. Conversely, SMP PIQ 96 stores I/O
transactions to be conveyed to an I/O interface residing in
another SMP node 12. I/O transactions generally are considered
non-coherent and therefore do not generate coherency
activities.

SMP in queue 94 and SMP PIQ 96 receive transactions to
be queued from a transaction filter 98. Transaction filter 98
is coupled to MTAG 68 and SMP address bus 58. If
transaction filter 98 detects an I/O transaction upon address
bus 58 which identifies an I/O interface upon another SMP
node 12, transaction filter 98 places the transaction into
SMP PIQ 96. If a coherent transaction to an LPA address is
detected by transaction filter 98, then the corresponding
coherency state from MTAG 68 is examined. In accordance
with the coherency state, transaction filter 98 may assert
ignore signal 70 and may queue a coherency transaction in
SMP in queue 94. Ignore signal 70 is asserted and a
coherency transaction queued if MTAG 68 indicates that
insufficient access rights to the coherency unit for performing
the coherent transaction are maintained by SMP node
12A. Conversely, ignore signal 70 is deasserted and a
coherency transaction is not generated if MTAG 68 indicates
that a sufficient access right is maintained by SMP node
12A.

Transactions from SMP in queue 94 and SMP PIQ 96 are
processed by a request agent 100 within system interface 24.
Prior to action by request agent 100, LPA2GA translation
unit 82 translates the address of the transaction (if it is an
LPA address) from the local physical address presented upon
SMP address bus 58 into the corresponding global address.
Request agent 100 then generates a header packet specifying
a particular coherency request to be transmitted to the home
node identified by the global address. The coherency request
is placed into output header queue 86. Subsequently, a
coherency reply is received into input header queue 84.
Request agent 100 processes the coherency replies from
input header queue 84, potentially generating reissue
transactions for SMP out queue 92 (as described below).

Also included in system interface 24 is a home agent 102
and a slave agent 104. Home agent 102 processes coherency
requests received from input header queue 84. From the
coherency information stored in directory 66 with respect to
a particular global address, home agent 102 determines if a
coherency demand is to be transmitted to one or more slave
agents in other SMP nodes 12. In one embodiment, home
agent 102 blocks the coherency information corresponding
to the affected coherency unit. In other words, subsequent
requests involving the coherency unit are not performed
until the coherency activity corresponding to the coherency
request is completed. According to one embodiment, home
agent 102 receives a coherency completion from the request
agent which initiated the coherency request (via input header
queue 84). The coherency completion indicates that the
coherency activity has completed. Upon receipt of the
coherency completion, home agent 102 removes the block
upon the coherency information corresponding to the
affected coherency unit. It is noted that, since the coherency
information is blocked until completion of the coherency
activity, home agent 102 may update the coherency information
in accordance with the coherency activity performed
immediately when the coherency request is received.

Slave agent 104 receives coherency demands from home
agents of other SMP nodes 12 via input header queue 84. In
response to a particular coherency demand, slave agent 104
may queue a coherency transaction in SMP out queue 92. In
one embodiment, the coherency transaction may cause
caches 18 and caches internal to processors 16 to invalidate
the affected coherency unit. If the coherency unit is modified
in the caches, the modified data is transferred to system
interface 24. Alternatively, the coherency transaction may
cause caches 18 and caches internal to processors 16 to
change the coherency state of the coherency unit to shared.
Once slave agent 104 has completed activity in response to
a coherency demand, slave agent 104 transmits a coherency
reply to the request agent which initiated the coherency
request corresponding to the coherency demand. The coherency
reply is queued in output header queue 86. Prior to
performing activities in response to a coherency demand, the
global address received with the coherency demand is
translated to a local physical address via GA2LPA translation
unit 80.

According to one embodiment, the coherency protocol
enforced by request agents 100, home agents 102, and slave
agents 104 includes a write invalidate policy. In other words,
when a processor 16 within an SMP node 12 updates a
coherency unit, any copies of the coherency unit stored
within other SMP nodes 12 are invalidated. However, other
write policies may be used in other embodiments. For
example, a write update policy may be employed. According
to a write update policy, when a coherency unit is updated
the updated data is transmitted to each of the copies of the
coherency unit stored in each of the SMP nodes 12.

Turning next to FIG. 4, a diagram depicting typical
coherency activity performed between the request agent 100
of a first SMP node 12A-12D (the "requesting node"), the
home agent 102 of a second SMP node 12A-12D (the
"home node"), and the slave agent 104 of a third SMP node
12A-12D (the "slave node") in response to a particular
transaction upon the SMP bus 20 within the SMP node 12
corresponding to request agent 100 is shown. Specific coherency
activities employed according to one embodiment of
computer system 10 as shown in FIG. 1 are further described
below with respect to FIGS. 9-13. Reference numbers 100,
102, and 104 are used to identify request agents, home
agents, and slave agents throughout the remainder of this
description. It is understood that, when an agent communicates
with another agent, the two agents often reside in
different SMP nodes 12A-12D.

Upon receipt of a transaction from SMP bus 20, request
agent 100 forms a coherency request appropriate for the
transaction and transmits the coherency request to the home
node corresponding to the address of the transaction
(reference number 110). The coherency request indicates the
access right requested by request agent 100, as well as the
global address of the affected coherency unit. The access
right requested is sufficient for allowing occurrence of the
transaction being attempted in the SMP node 12 corresponding
to request agent 100.

Upon receipt of the coherency request, home agent 102
accesses the associated directory 66 and determines which
SMP nodes 12 are storing copies of the affected coherency
unit. Additionally, home agent 102 determines the owner of
the coherency unit. Home agent 102 may generate a coherency
demand to the slave agents 104 of each of the nodes
storing copies of the affected coherency unit, as well as to
the slave agent 104 of the node which has the owned
coherency state for the affected coherency unit (reference
number 112). The coherency demands indicate the new
coherency state for the affected coherency unit in the receiving
SMP nodes 12, and may further include a "Reply Count"
value indicative of the number of replies to be received.
While the coherency request is outstanding, home agent 102
blocks the coherency information corresponding to the
affected coherency unit such that subsequent coherency
requests involving the affected coherency unit are not initiated
by the home agent 102. Home agent 102 additionally
updates the coherency information to reflect completion of
the coherency request.

Home agent 102 may additionally transmit a coherency
reply to request agent 100 (reference number 114). The
coherency reply may also indicate the number of coherency
replies which are forthcoming from slave agents 104.
Alternatively, certain transactions may be completed without
interaction with slave agents 104. For example, an I/O
transaction targeting an I/O interface 26 in the SMP node 12
containing home agent 102 may be completed by home
agent 102. Home agent 102 may queue a transaction for the
associated SMP bus 20 (reference number 116), and then
transmit a reply indicating that the transaction is complete.

A slave agent 104, in response to a coherency demand
from home agent 102, may queue a transaction for presentation
upon the associated SMP bus 20 (reference number
118). Additionally, slave agents 104 transmit a coherency
reply to request agent 100 (reference number 120). The
coherency reply indicates that the coherency demand
received in response to a particular coherency request has
been completed by that slave. The coherency reply may
further include the Reply Count value. The coherency reply
is transmitted by slave agents 104 when the coherency
demand has been completed, or at such time prior to
completion of the coherency demand at which the coherency
demand is guaranteed to be completed upon the corresponding
SMP node 12 and at which no state changes to the
affected coherency unit will be performed prior to completion
of the coherency demand.

When request agent 100 has received a coherency reply
from each of the affected slave agents 104 (e.g., when the
number of received replies equals the Reply Count value),
request agent 100 transmits a coherency completion to home
agent 102 (reference number 122). Upon receipt of the
coherency completion, home agent 102 removes the block
from the corresponding coherency information. Request
agent 100 may queue a reissue transaction for performance
upon SMP bus 20 to complete the transaction within the
SMP node 12 (reference number 124).

It is noted that each coherency request is assigned a
unique tag by the request agent 100 which issues the
coherency request. Subsequent coherency demands, coherency
replies, and coherency completions include the tag. In
this manner, coherency activity regarding a particular coherency
request may be identified by each of the involved
agents. It is further noted that non-coherent operations may
be performed in response to non-coherent transactions (e.g.
I/O transactions). Non-coherent operations may involve
only the requesting node and the home node. Still further, a
different unique tag may be assigned to each coherency
request by the home agent 102. The different tag identifies
the home agent 102, and is used for the coherency completion
in lieu of the requester tag.

Turning now to FIG. 5A, a diagram depicting coherency
activity for an exemplary embodiment of computer system
10 in response to a read to own transaction upon SMP bus
20 is shown. A read to own transaction is performed when
a cache miss is detected for a particular datum requested by
a processor 16 and the processor 16 requests write permission
to the coherency unit. A store cache miss may generate
a read to own transaction, for example.

A request agent 100, home agent 102, and several slave
agents 104 are shown in FIG. 5A. The node receiving the
read to own transaction from SMP bus 20 stores the affected
coherency unit in the invalid state (e.g. the coherency unit is
not stored in the node). The subscript "i" in request agent 100
indicates the invalid state. The home node stores the coherency
unit in the shared state, and nodes corresponding to
several slave agents 104 store the coherency unit in the
shared state as well. The subscript "s" in home agent 102 and
slave agents 104 is indicative of the shared state at those
nodes. The read to own operation causes transfer of the
requested coherency unit to the requesting node. The
requesting node receives the coherency unit in the modified
state.

Upon receipt of the read to own transaction from SMP bus
20, request agent 100 transmits a read to own coherency
request to the home node of the coherency unit (reference
number 130). The home agent 102 in the receiving home
node detects the shared state for one or more other nodes.
Since the slave agents 104 are each in the shared state, not the
owned state, the home node may supply the requested data
directly. Home agent 102 transmits a data coherency reply to
request agent 100, including the data corresponding to the
requested coherency unit (reference number 132).
Additionally, the data coherency reply includes the Reply Count
value which indicates the total number of replies which are
to be received prior to request agent 100 taking ownership
of the data. Home agent 102 updates directory 66 to indicate
that the requesting SMP node 12A-12D is the owner of the
coherency unit, and that each of the other SMP nodes
12A-12D is invalid. When the coherency information
regarding the coherency unit is unblocked upon receipt of a
coherency completion from request agent 100, directory 66
matches the state of the coherency unit at each SMP node 12.

Home agent 102 transmits invalidate coherency demands
to each of the slave agents 104 which are maintaining shared
copies of the affected coherency unit (reference numbers
134A, 134B, and 134C). Each coherency demand may
include the Reply Count value. The invalidate coherency
demand causes the receiving slave agent to invalidate the
corresponding coherency unit within the node, and to send
an acknowledge coherency reply to the requesting node
indicating completion of the invalidation. Each slave agent
104 completes invalidation of the coherency unit and subsequently
transmits an acknowledge coherency reply
(reference numbers 136A, 136B, and 136C). In one
embodiment, each of the acknowledge replies includes the
Reply Count value indicating the total number of replies to
be received by request agent 100 with respect to the coherency
unit.

Subsequent to receiving each of the acknowledge coherency
replies from slave agents 104 and the data coherency
reply from home agent 102, request agent 100 transmits a
coherency completion to home agent 102 (reference number
138). Request agent 100 validates the coherency unit within
its local memory, and home agent 102 releases the block
upon the corresponding coherency information. It is noted
that data coherency reply 132 and acknowledge coherency
replies 136 may be received in any order depending upon the
number of outstanding transactions within each node,

demands to all other slave agents 104 with a shared copy
(reference number 133C). Each of these messages may also
indicate the number of replies to be received.

The owning slave agent 103 replies with data to the
requesting agent 100 (reference number 133D) and invalidates
its copy. The reply may include the Reply Count
value. All sharing slave agents 104 send invalidation
acknowledges to the requesting agent (reference number
133E). The Reply Count value may be
sent with each of these messages as well. After receiving all
acknowledges and the data, the request agent 100 sends a
coherency completion back to the home agent 102
(reference number 133F). Home agent 102 responsively
removes the block.

FIG. 5C illustrates a transaction wherein request agent
100 has a shared copy and sends a read-to-own request to
home agent 102 (reference number 135A). When home
agent 102 receives the read-to-own request, home agent 102
blocks new transactions to this line. Home agent 102
further sends invalidation demands (reference number
135B) to all other nodes with a copy of the line (not to the
requester, however). These demands include the Reply
Count value. Home agent 102 further marks request agent
100 as owner.

The slave agents (103 and 104) send invalidation acknowledges
to request agent 100 (reference numbers 135C and
135D) and invalidate their copies. These messages further
include the Reply Count value. Finally, request agent 100
sends a coherency completion back to the home agent 102
after receiving all acknowledges (reference number 135E).
This causes home agent 102 to remove the block from the
line.

FIG. 5D depicts coherency activity in response to a
read-to-share request when a slave is the owner of the
coherency unit. Similar to the above description, the coherency
activity initiates when the request agent 100 sends a
read-to-share request to the home agent 102 (reference
number 137A). This causes home agent 102 to block new
transactions to this line. Home agent 102 marks the requester
as a sharer and sends an RTS demand to the owner slave
agent 103 (reference number 137B). The owning slave agent
103 replies with data to the request agent 100 (reference
number 137C) and stays in the owned state. Finally, request
agent 100 sends a coherency completion to the home agent
(reference number 137D), which causes the block of this line
to be removed.

It is noted that for read-to-share transaction requests, the
reply count is one. For such transactions, the system may be
implemented such that the Reply Count value is transmitted
from the home agent to slaves and forwarded to the requesting
agent in a manner similar to that described above for
read-to-own transactions. Alternatively, the Reply Count
among other things. value may not be conveyed for these transactions. Instead, 
FIG. 5B is a diagram depicting coherency activity in 55 the request agent may be configured to send the coherency 
response to a read-to-own transaction request when a slave completion immediately upon receiving a single reply, 
agent 103 is the current owner of the coherency unit and It is further noted thai implementations are possible 
other slave agents 104 have shared copies of the coherency wherein the reply count is transmitted via only one coher- 
unit. The request agent 100 initiates the transaction by ency demand and one corresponding coherency reply. In the 
sending a read-lo-own request to home agent 102 (reference 60 above embodiment, since all demand and reply transactions 
number 133A). This causes home agent 102 to block new include the reply count, the implementation may be simpli- 
transactions to this line. Home agent 102 marks the requester fled since it is unknown which reply will first arrive at the 
as the sole owner of the line and sends an RTO demand to request agent. This allows for a symmetric design which also 
the owning slave agent 103 (reference number 133B). covers for cases wherein there is only a single data reply. 
Additionally, the read-to-own demand includes the Reply 65 Turning now to FIG. 6, a flowchart 140 depicting an 
Count value which indicates the number of replies to be exemplary state machine for use by request agents 100 is 
received. Home agent also sends invalidate coherency shown. Request agents 100 may include multiple indepen- 
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denl copies of the slate machine represented by flowchart Turning next to FIG. 7, a flowchart 160 depicting an 

140, such that multiple requests may be concurrently pro- exemplary state machine for home agent 102 is shown, 

cessed. Home agents 102 may include multiple independent copies 

Upon receipt of a transaction from SMP in queue 94, of the state machine represented by flowchart 160 in order 

request agent 100 enters a request ready state 142. In request 5 to allow for processing of multiple outstanding requests to 

ready state 142, request agent 100 transmits a coherency the home agent 102. However, the multiple outstanding 

request to the home agent 102 residing in the home node requests do not aflfect the same coherency unit, according to 

identified by the global address of the affected coherency one embodiment, 

unit. Upon transmission of the coherency request, request . ^n^^ • l 

agent 100 transitions to a request active sUle 144. During ,o ""-"^ »f 1»2 coherency requests in a receive 

request active sute 144. request agent 100 receives coher- '^f ^"'^ ""^^ "^^^'^^^ ^"^"^ * 

ency replies &om slave agents 104 (and optionally from ^"^'^1' '^"l"^' °^ wA'="°°, "^Tf/n 

home agent 102). When each of the coherency replfes has •^^n^'f °n '^^^^f -^ay mclude 1/0 read and I/O write 

been received, request agent 100 transitions to a new state '^''"'f ' '"'''""P' '"J"".**^' and administrative requests, 

depending upon the type of transaction which initiated the "f m kT embodiment. The non<oherent requests 

coherency activity. Additionally, request active state 142 " handled by transmitting a transaction upon SMP bus 20, 

may employ a timer for detecting that coherency replies f ^^te 164. A coherericy completion is subsequently 

have not be received within a predlflned timeout period. If «^''ns»i"ed. Upon receiving the coherency completion. I/O 

ti^^^ ^^^^ t« tu^ u^, - V write and accepted mtcrrupt transactions result m transmis- 

tne timer expires pnor to the receipt of the number of replies . c j ^ . -^^^^ i i i 

specified by home agent 102, then request agent 100 tran- sion of a data transaction upon SMP bus 20 m 

sitions to an error stale (not shown) Still further, certain O-^- data only state 165). When the data has be 

embodiments may employ a reply indicating that a read agent 102 transitions to idle state 166. AltemaUvely, 

transfer failed. If such a reply is received, request agent 100 ^dinmistrative. and rejected mterrupted transac- 

transitions to request ready state 142 to realtempt the read. ^""^ " ^ "^^^ ""^'^ "P°° °^ 

If I ,1 -.u ♦ * *u .1. coherency completion. 

Ii replies are received without error or time-out, then the is ^ 

state transitioned to by request agent 100 for read transac- Conversely, home agent 102 transitions to a check state 

lions is read complete state 146. It is noted that, for read ^P°° '"^^^P^ °^ « coherent request. Check state 168 is 

transactions, one of the received replies may include the data ^^^^^ ^ coherency activity is in progress for the 

corresponding to the requested coherency unit. Request coherency unit affected by the coherency request. If the 

agent 100 reissues the read transaction upon SMP bus 20 and 30 coherency activity is in progress (i.e. the coherency mfor- 

further transmits the coherency completion to home agent ^ blocked), then home agent 102 remains m check 

102. Subsequently, request agent 100 transitions to an idle ^^^^^ ^^^'^ m-progress coherency activity completes, 

slate 148. A new transaction may then be serviced by request "^^^ subsequently transitions to a set state 170. 

agent 100 using the state machine depicted in FIG. 6. During set state 170, home agent 102 sets the status of the 

Cbnversely, write active state 150 and ignored write 35 '^^''^^^ory entry storing the coherency information corre- 

reissue state 152 are used for write transactions. Ignore sponding to the affected coherency unit to blocked. The 

signal 70 is not asserted for certain write transactions in l^locked status prevents subsequent activity to the affected 

computer system 10, even when coherency activity is initi- coherency unit from proceeding, simplifying the coherency 

ated upon network 14. For example, I/O write transactions protocol of computer system 10. Depending upon the read or 

are not ignored. The write data is transferred to system 40 write nature of the transaction corresponding to the received 

interface. 24, and is stored therein. Write active state 150 is coherency request, home agent 102 transitions to read state 

employed for non-ignored write transactions, to allow for ^^^^ ""^P^y ^^^^^ 

transfer of data to system interface 24 if the coherency While in read state 172, home agent 102 issues coherency 

replies are received prior to the data phase of the write demands to slave agents 104 which are to be updated with 

transaction upon SMP bus 20. Once the corresponding data 45 respect to the read transaction. Home agent 102 remains in 

has been received, request agent 100 transitions to write ^^^^ state 172 until a coherency completion is received from 

complete stale 154. During write complete state 154, the request agent 100, after which home agent 102 transitions to 

coherency completion reply is transmitted to home agent clear block status state 176. In embodiments in which a 

102. Subsequently, request agent 100 transitions to idle state coherency request for a read may fail, home agent 102 

148. 50 restores the state of the affected directory entry to the state 

Ignored write transactions are handled via a transition to P"°' coherency request upon receipt of a coherency 

ignored write reissue state 152. During ignored write reissue completion indicating failure of the read transaction, 

state 152, request agent 100 reissues the ignored write During write state 174, home agent 102 transmits a 

transaction upon SMP bus 20. In this manner, the write data coherency reply to request agent 100. Home agent 102 

may be transferred from the originating processor 16 and the 55 remains in write reply state 174 until a coherency comple- 

corresponding write transaction released by processor 16. lion is received from request agent 100, If data is received 

Depending upon whether or not the write dala is to be with the coherency completion, home agent 102 transitions 

transmitted with the coherency completion, request agent ^ write data state 178. Alternatively, home agent 102 

100 transitions to either the ignored write active state 156 or transitions to clear block status state 176 upon receipt of a 

the ignored write complete state 158. Ignored write active 60 coherency completion not containing data, 

state 156, similar to write active state 150, is used to await Home agent 102 issues a write transaction upon SMP bus 

data transfer from SMP bus 20. During ignored write com- 20 during write dala state 178 in order to transfer the 

plete state 158, the coherency completion is transmitted to received write dala. For example, a write stream operation 

home agent 102. Subsequently, request agent 100 transitions (described below) results in a data transfer of data to home 

to idle state 148. From idle state 148, request agent 100 65 agent 102. Home agent 102 transmits the received data to 

transitions to request ready state 142 upon receipt of a memory 22 for storage. Subsequently, home agent 102 

transaction from SMP in queue 94. transitions to clear blocked status state 176. 
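The read path of the request agent state machine of FIG. 6 may be illustrated by a short sketch. The following Python fragment is a minimal illustration only, not the disclosed implementation: the class name `RequestAgent`, the `outbox` message log, and the method names are hypothetical, and it assumes that every coherency reply carries the Reply Count value so that the agent learns the expected total from whichever reply arrives first.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class State(Enum):
    IDLE = auto()            # idle state 148
    REQUEST_READY = auto()   # request ready state 142
    REQUEST_ACTIVE = auto()  # request active state 144
    READ_COMPLETE = auto()   # read complete state 146
    ERROR = auto()           # error state (not shown in FIG. 6)


@dataclass
class RequestAgent:
    state: State = State.IDLE
    expected: Optional[int] = None   # reply count, unknown until a reply arrives
    received: int = 0
    outbox: list = field(default_factory=list)  # hypothetical log of sent messages

    def start_read(self, address: int) -> None:
        """Take a transaction from the queue and issue the coherency request."""
        if self.state is not State.IDLE:
            raise RuntimeError("agent is busy")
        self.state = State.REQUEST_READY
        self.outbox.append(("request", address))
        self.expected, self.received = None, 0
        self.state = State.REQUEST_ACTIVE

    def on_reply(self, reply_count: int) -> None:
        """Count one coherency reply; finish once all expected replies are in."""
        if self.state is not State.REQUEST_ACTIVE:
            raise RuntimeError("unexpected reply")
        self.expected = reply_count      # assumed identical in every reply
        self.received += 1
        if self.received == self.expected:
            self.state = State.READ_COMPLETE
            self.outbox.append(("completion",))
            self.state = State.IDLE

    def on_timeout(self) -> None:
        """Timer expired before the expected number of replies arrived."""
        if self.state is State.REQUEST_ACTIVE:
            self.state = State.ERROR
```

Because the count travels with each reply in this sketch, the agent needs no prior knowledge of how many nodes hold a copy, and the order in which replies arrive does not matter.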
Home agent 102 clears the blocked status of the coherency information corresponding to the coherency unit affected by the received coherency request in clear block status state 176. The coherency information may be subsequently accessed. The state found within the unblocked coherency information reflects the coherency activity initiated by the previously received coherency request. After clearing the block status of the corresponding coherency information, home agent 102 transitions to idle state 166. From idle state 166, home agent 102 transitions to receive request state 162 upon receipt of a coherency request.

Turning now to FIG. 8, a flowchart 180 is shown depicting an exemplary state machine for slave agents 104. Slave agent 104 receives coherency demands during a receive state 182. In response to a coherency demand, slave agent 104 may queue a transaction for presentation upon SMP bus 20. The transaction causes a state change in caches 18 and caches internal to processors 16 in accordance with the received coherency demand. Slave agent 104 queues the transaction during send request state 184.

During send reply state 186, slave agent 104 transmits a coherency reply to the request agent 100 which initiated the transaction. It is noted that, according to various embodiments, slave agent 104 may transition from send request state 184 to send reply state 186 upon queuing the transaction for SMP bus 20 or upon successful completion of the transaction upon SMP bus 20. Subsequent to coherency reply transmittal, slave agent 104 transitions to an idle state 188. From idle state 188, slave agent 104 may transition to receive state 182 upon receipt of a coherency demand.

Turning now to FIGS. 9-12, several tables are shown listing exemplary coherency request types, coherency demand types, coherency reply types, and coherency completion types. The types shown in the tables of FIGS. 9-12 may be employed by one embodiment of computer system 10. Other embodiments may employ other sets of types.

FIG. 9 is a table 190 listing the types of coherency requests. A first column 192 lists a code for each request type, which is used in FIG. 13 below. A second column 194 lists the coherency request types, and a third column 196 indicates the originator of the coherency request. Similar columns are used in FIGS. 10-12 for coherency demands, coherency replies, and coherency completions. An "R" indicates request agent 100; an "S" indicates slave agent 104; and an "H" indicates home agent 102.

A read to share request is performed when a coherency unit is not present in a particular SMP node and the nature of the transaction from SMP bus 20 to the coherency unit indicates that read access to the coherency unit is desired. For example, a cacheable read transaction may result in a read to share request. Generally speaking, a read to share request is a request for a copy of the coherency unit in the shared state. Similarly, a read to own request is a request for a copy of the coherency unit in the owned state. Copies of the coherency unit in other SMP nodes should be changed to the invalid state. A read to own request may be performed in response to a cache miss of a cacheable write transaction, for example.

Read stream and write stream are requests to read or write an entire coherency unit. These operations are typically used for block copy operations. Processors 16 and caches 18 do not cache data provided in response to a read stream or write stream request. Instead, the coherency unit is provided as data to the processor 16 in the case of a read stream request, or the data is written to the memory 22 in the case of a write stream request. It is noted that read to share, read to own, and read stream requests may be performed as COMA operations (e.g. RTS, RTO, and RS) or as NUMA operations (e.g. RTSN, RTON, and RSN).

A write back request is performed when a coherency unit is to be written to the home node of the coherency unit. The home node replies with permission to write the coherency unit back. The coherency unit is then passed to the home node with the coherency completion.

The invalidate request is performed to cause copies of a coherency unit in other SMP nodes to be invalidated. An exemplary case in which the invalidate request is generated is a write stream transaction to a shared or owned coherency unit. The write stream transaction updates the coherency unit, and therefore copies of the coherency unit in other SMP nodes are invalidated.

I/O read and write requests are transmitted in response to I/O read and write transactions. I/O transactions are non-coherent (i.e. the transactions are not cached and coherency is not maintained for the transactions). I/O block transactions transfer a larger portion of data than normal I/O transactions. In one embodiment, sixty-four bytes of information are transferred in a block I/O operation while eight bytes are transferred in a non-block I/O transaction.

Flush requests cause copies of the coherency unit to be invalidated. Modified copies are returned to the home node. Interrupt requests are used to signal interrupts to a particular device in a remote SMP node. The interrupt may be presented to a particular processor 16, which may execute an interrupt service routine stored at a predefined address in response to the interrupt. Administrative packets are used to send certain types of reset signals between the nodes.

FIG. 10 is a table 198 listing exemplary coherency demand types. Similar to table 190, columns 192, 194, and 196 are included in table 198. A read to share demand is conveyed to the owner of a coherency unit, causing the owner to transmit data to the requesting node. Similarly, read to own and read stream demands cause the owner of the coherency unit to transmit data to the requesting node. Additionally, a read to own demand causes the owner to change the state of the coherency unit in the owner node to invalid. Read stream and read to share demands cause a state change to owned (from modified) in the owner node.

Invalidate demands do not cause the transfer of the corresponding coherency unit. Instead, an invalidate demand causes copies of the coherency unit to be invalidated. Finally, administrative demands are conveyed in response to administrative requests. It is noted that each of the demands is initiated by home agent 102, in response to a request from request agent 100.

FIG. 11 is a table 200 listing exemplary reply types employed by one embodiment of computer system 10. Similar to FIGS. 9 and 10, FIG. 11 includes columns 192, 194, and 196 for the coherency replies.

A data reply is a reply including the requested data. The owner slave agent typically provides the data reply for coherency requests. However, home agents may provide data for I/O read requests.

The acknowledge reply indicates that a coherency demand associated with a particular coherency request is completed. Slave agents typically provide acknowledge replies, but home agents provide acknowledge replies (along with data) when the home node is the owner of the coherency unit.

Slave not owned, address not mapped and error replies are conveyed by slave agent 104 when an error is detected. The slave not owned reply is sent if a slave is identified by home agent 102 as the owner of a coherency unit and the slave no longer owns the coherency unit. The address not mapped reply is sent if the slave receives a demand for which no device upon the corresponding SMP bus 20 claims ownership. Other error conditions detected by the slave agent are indicated via the error reply.

In addition to the error replies available to slave agent 104, home agent 102 may provide error replies. The negative acknowledge (NACK) and negative response (NOPE) are used by home agent 102 to indicate that the corresponding request does not require service by home agent 102. The NACK transaction may be used to indicate that the corresponding request is rejected by the home node. For example, an interrupt request receives a NACK if the interrupt is rejected by the receiving node. An acknowledge (ACK) is conveyed if the interrupt is accepted by the receiving node. The NOPE transaction is used to indicate that a corresponding flush request was conveyed for a coherency unit which is not stored by the requesting node.

FIG. 12 is a table 202 depicting exemplary coherency completion types according to one embodiment of computer system 10. Similar to FIGS. 9-11, FIG. 12 includes columns 192, 194, and 196 for coherency completions.

A completion without data is used as a signal from request agent 100 to home agent 102 that a particular request is complete. In response, home agent 102 unblocks the corresponding coherency information. Two types of data completions are included, corresponding to dissimilar transactions upon SMP bus 20. One type of reissue transaction involves only a data phase upon SMP bus 20. This reissue transaction may be used for I/O write and interrupt transactions, in one embodiment. The other type of reissue transaction involves both an address and data phase. Coherent writes, such as write stream and write back, may employ the reissue transaction including both address and data phases. Finally, a completion indicating failure is included for read requests which fail to acquire the requested state.

Turning next to FIG. 13, a table 210 is shown depicting coherency activity in response to various transactions upon SMP bus 20. Table 210 depicts transactions which result in requests being transmitted to other SMP nodes 12. Transactions which complete within an SMP node are not shown. A "-" in a column indicates that no activity is performed with respect to that column in the case considered within a particular row. A transaction column 212 is included indicating the transaction received upon SMP bus 20 by request agent 100. MTAG column 214 indicates the state of the MTAG for the coherency unit accessed by the address corresponding to the transaction. The states shown include the MOSI states described above, and an "n" state. The "n" state indicates that the coherency unit is accessed in NUMA mode for the SMP node in which the transaction is initiated. Therefore, no local copy of the coherency unit is stored in the requesting node's memory. Instead, the coherency unit is transferred from the home SMP node (or an owner node) and is transmitted to the requesting processor 16 or cache 18 without storage in memory 22.

A request column 216 lists the coherency request transmitted to the home agent identified by the address of the transaction. Upon receipt of the coherency request listed in column 216, home agent 102 checks the state of the coherency unit for the requesting node as recorded in directory 66. D column 218 lists the current state of the coherency unit recorded for the requesting node, and D' column 220 lists the state of the coherency unit recorded for the requesting node as updated by home agent 102 in response to the received coherency request. Additionally, home agent 102 may generate a first coherency demand to the owner of the coherency unit and additional coherency demands to any nodes maintaining shared copies of the coherency unit. The coherency demand transmitted to the owner is shown in column 222, while the coherency demand transmitted to the sharing nodes is shown in column 224. Still further, home agent 102 may transmit a coherency reply to the requesting node. Home agent replies are shown in column 226.

The slave agent 104 in the node indicated as the owner of the coherency unit transmits a coherency reply as shown in column 228. Slave agents 104 in nodes indicated as sharers respond to the coherency demands shown in column 224 with the coherency replies shown in column 230, subsequent to performing state changes indicated by the received coherency demand.

Upon receipt of the appropriate number of coherency replies, request agent 100 transmits a coherency completion to home agent 102. The coherency completions used for various transactions are shown in column 232.

As an example, a row 234 depicts the coherency activity in response to a read to share transaction upon SMP bus 20 for which the corresponding MTAG state is invalid. The corresponding request agent 100 transmits a read to share coherency request to the home node identified by the global address associated with the read to share transaction. For the case shown in row 234, the directory of the home node indicates that the requesting node is storing the data in the invalid state. The state in the directory of the home node for the requesting node is updated to shared, and a read to share coherency demand is transmitted by home agent 102 to the node indicated by the directory to be the owner. No demands are transmitted to sharers, since the transaction seeks to acquire the shared state. The slave agent 104 in the owner node transmits the data corresponding to the coherency unit to the requesting node. Upon receipt of the data, the request agent 100 within the requesting node transmits a coherency completion to the home agent 102 within the home node. The transaction is therefore complete.

It is noted that the state shown in D column 218 may not match the state in MTAG column 214. For example, a row 236 shows a coherency unit in the invalid state in MTAG column 214. However, the corresponding state in D column 218 may be modified, owned, or shared. Such situations occur when a prior coherency request from the requesting node for the coherency unit is outstanding within computer system 10 when the access to MTAG 68 for the current transaction to the coherency unit is performed upon address bus 58. However, due to the blocking of directory entries during a particular access, the outstanding request is completed prior to access of directory 66 by the current request. For this reason, the generated coherency demands are dependent upon the directory state (which matches the MTAG state at the time the directory is accessed). For the example shown in row 236, since the directory indicates that the coherency unit now resides in the requesting node, the read to share request may be completed by simply reissuing the read transaction upon SMP bus 20 in the requesting node. Therefore, the home node acknowledges the request, including a reply count of one, and the requesting node may subsequently reissue the read transaction. It is further noted that, although table 210 lists many types of transactions, additional transactions may be employed according to various embodiments of computer system 10.

Although SMP nodes 12 have been described in the above exemplary embodiments, generally speaking an embodiment of computer system 10 may include one or more processing nodes. As used herein, a processing node includes at least one processor and a corresponding memory. Additionally, circuitry for communicating with other processing nodes is included. When more than one processing node is included in an embodiment of computer system 10, the corresponding memories within the processing nodes form a distributed shared memory. A processing node may be referred to as remote or local. A processing node is a remote processing node with respect to a particular processor if the processing node does not include the particular processor. Conversely, the processing node which includes the particular processor is that particular processor's local processing node.

Numerous variations and modifications will become 
apparent to those skilled in the art once the above disclosure 
is fully appreciated. It is intended that the following claims 
be interpreted to embrace all such variations and modifica- 
tions. 
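
By way of illustration only, the three-hop exchange described above for a read-to-own transaction (request to the home agent, demands to the slave agents, replies directly to the request agent, followed by a completion) may be sketched as follows. This is a minimal Python model under stated assumptions, not the disclosed implementation: `HomeAgent`, `holders`, and `run_read_to_own` are hypothetical names, the directory is reduced to a set of node names, and the case in which no other node holds a copy is modelled by a single reply from the home agent.

```python
from dataclasses import dataclass


@dataclass
class HomeAgent:
    holders: set            # directory: nodes currently holding a copy
    blocked: bool = False   # True while a transaction to this line is in flight

    def read_to_own(self, requester):
        """Handle the request (hop 1); return demands as (slave, reply_count)."""
        if self.blocked:
            raise RuntimeError("line is blocked by an in-progress transaction")
        self.blocked = True
        slaves = sorted(self.holders - {requester})
        self.holders = {requester}   # requester is marked as the sole owner
        return [(slave, len(slaves)) for slave in slaves]

    def completion(self):
        """Coherency completion from the requester: remove the block."""
        self.blocked = False


def run_read_to_own(home, requester):
    """Drive one read-to-own transaction and return the ordered message trace."""
    trace = [("request", requester, "home")]
    demands = home.read_to_own(requester)
    received, expected = 0, None
    if not demands:
        # No other node holds a copy: the home agent supplies the single reply.
        trace.append(("reply", "home", requester))
        received, expected = 1, 1
    for slave, _count in demands:            # hop 2: home agent to slave agents
        trace.append(("demand", "home", slave))
    for slave, count in demands:             # hop 3: slave agents to requester
        trace.append(("reply", slave, requester))
        expected = count                     # every reply carries the same count
        received += 1
    if received == expected:
        home.completion()
        trace.append(("completion", requester, "home"))
    return trace


if __name__ == "__main__":
    home = HomeAgent(holders={"A", "B", "C"})
    for message in run_read_to_own(home, "A"):
        print(message)
```

In this sketch the home agent never relays the replies itself; it only learns that the transaction has finished when the completion arrives, at which point the block on the line is removed.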

What is claimed is: 

1. A multiprocessing computer system including a plural- 
ity of processing nodes interconnected by a network, said 
multiprocessing computer system comprising: 

a request agent configured to generate a coherency 
request; 

a home agent coupled to receive said coherency request 
through said network and to generate a coherency 
demand in response to said coherency request; and 

a slave agent coupled to receive said coherency demand 
through said network and to generate a coherency reply 
in response to said coherency demand; 
wherein said request agent is configured to receive said
coherency reply through said network from said slave 
agent and to generate a coherency completion in 
response to said coherency reply; 

wherein said home agent is coupled to receive said
coherency completion;

wherein said home agent is configured to generate a
separate coherency demand for each node of said
multiprocessing computer system which has a copy of
a coherency unit corresponding to said coherency
request; and

wherein each said separate coherency demand includes a 
reply count indicative of a total number of replies to be 
received by said request agent. 

2. The multiprocessing computer system as recited in
claim 1 wherein said home agent includes a directory
indicative of particular nodes of said multiprocessing com-
puter system that contain copies of a given coherency unit.

3. The multiprocessing computer system as recited in
claim 2 wherein said given coherency unit is specified by
said coherency request.

4. The multiprocessing computer system as recited in 
claim 1 wherein each said separate coherency demand 
indicates a new coherency state for said coherency unit. 

5. The multiprocessing computer system as recited in 
claim 1 wherein said request agent is configured to convey 
said coherency completion to said home agent after a 
coherency reply has been received by said request agent 
from each node having a copy of said coherency unit. 

6. The multiprocessing computer system as recited in 
claim 1 wherein said request agent is configured to assign a 
unique tag to said coherency request. 

7. The multiprocessing computer system as recited in 
claim 1 wherein said home agent is configured to block 
subsequent requests to a coherency unit corresponding to 
said coherency request in response to receiving said coher-
ency request.

8. The multiprocessing computer system as recited in
claim 7 wherein said home agent is configured to remove 
said block of subsequent requests to said coherency unit in 
response to a receipt of said coherency completion. 

9. The multiprocessing computer system as recited in 
claim 1 wherein said coherency request is a read-to-own 
request. 

10. The multiprocessing computer system as recited in 
claim 1 wherein said coherency request is a read-to-share 
request. 

11. A method for maintaining coherency in a multipro- 
cessing computer system including a plurality of processing 
nodes interconnected by a network, said method comprising: 

a request agent generating a coherency request;

a home agent receiving said coherency request through
said network;

said home agent generating a coherency demand in
response to said coherency request;

a slave agent receiving said coherency demand through
said network;

said slave agent generating a coherency reply in response 
to said coherency demand; 

said request agent receiving said coherency reply through 
said network from said slave agent; 

said request agent generating a coherency completion in 
response to said coherency reply; 

said home agent receiving said coherency completion; 

wherein said home agent generates a separate coherency 
demand for each node of said multiprocessing com- 
puter system which has a copy of a coherency unit 
corresponding to said coherency request; 

wherein each said separate coherency demand includes a 
reply count indicative of a total number of replies to be 
received by said request agent. 

12. The method for maintaining coherency as recited in 
claim 11 wherein each of said separate coherency demands 
indicates a new coherency state for said coherency unit. 

13. The method for maintaining coherency as recited in 
claim 11 wherein said coherency request is a read-to-own
request. 

14. The method for maintaining coherency as recited in 
claim 11 wherein said coherency request is a read-to-share 
request. 

15. A multiprocessing computer system including a plu- 
rality of processing nodes interconnected by a network, said 
multiprocessing computer system comprising: 

a request agent configured to generate a coherency 
request; 

a home agent coupled to receive said coherency request 
through said network; and 

a slave agent coupled to said network; 

wherein, if a valid copy of said slave agent contains a 
copy of a coherency unit corresponding to said coher- 
ency request, said home agent conveys a coherency 
demand in response to said coherency request to said 
slave agent, and wherein said slave agent conveys a 
coherency reply to said request agent in response to 
said coherency demand, and wherein said request agent 
conveys a coherency completion to said home agent in 
response to said coherency reply; and 

wherein, if said coherency unit resides solely within a 
local node corresponding to said home agent, said 
home agent conveys a coherency reply to said request 
agent in response to said coherency demand, and
wherein said request agent conveys a coherency
completion to said home agent in response to said
coherency reply;

wherein said home agent is configured to generate sepa-
rate coherency demands for each node of said multi-
processing computer system which has a copy of said
coherency unit corresponding to said coherency
request;

wherein each of said separate coherency demands 
includes a reply count indicative of a total number of 
replies to be received by said request agent. 

16. The multiprocessing computer system as recited in 
claim 15 wherein said home agent includes a directory 
indicative of particular nodes of said multiprocessing com- 
puter system that contain copies of a given coherency unit. 

17. The multiprocessing computer system as recited in 
claim 16 wherein said given coherency unit is specified by 
said coherency request. 

18. The multiprocessing computer system as recited in 
claim 15 wherein each of said separate coherency demands 
indicates a new coherency state for said coherency unit.

19. The multiprocessing computer system as recited in 
claim 15 wherein said request agent is configured to convey
said coherency completion to said home agent after a
coherency reply has been received by said request agent 
from each node having a copy of said coherency unit. 

20. The multiprocessing computer system as recited in
claim 15 wherein said request agent is configured to assign 
a unique tag to said coherency request. 

21. The multiprocessing computer system as recited in 
claim 14 wherein said home agent is configured to block 
subsequent requests to said coherency unit corresponding to 
said coherency request in response to receiving said coher- 
ency request. 

22. The multiprocessing computer system as recited in 
claim 21 wherein said home agent is configured to remove 
said block of subsequent requests to said coherency unit in 
response to a receipt of said coherency completion. 
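
For illustration only, the blocking behavior recited in claims 21 and 22 can be sketched in Python. This is a minimal sketch under stated assumptions: the class `BlockingHomeAgent` and its method names are hypothetical, and queueing a blocked request so that it starts once the block is removed is an assumption of the sketch, since the claims recite only that subsequent requests are blocked.

```python
from collections import deque


class BlockingHomeAgent:
    def __init__(self):
        self.blocked = set()   # coherency units with a transaction in flight
        self.pending = {}      # unit -> deque of deferred requester ids (assumption)
        self.started = []      # log of (unit, requester) in the order started

    def receive_request(self, unit, requester):
        # A request to a blocked unit is deferred rather than started.
        if unit in self.blocked:
            self.pending.setdefault(unit, deque()).append(requester)
            return False
        # Receiving the request places the block on the coherency unit.
        self.blocked.add(unit)
        self.started.append((unit, requester))
        return True

    def receive_completion(self, unit):
        # Receipt of the coherency completion removes the block.
        self.blocked.discard(unit)
        queue = self.pending.get(unit)
        if queue:
            # Start the oldest deferred request, which blocks the unit again.
            self.receive_request(unit, queue.popleft())
```

Requests to different coherency units do not block one another in this sketch; only a second request to the same unit waits for the completion.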

23. The multiprocessing computer system as recited in 
claim 15 wherein said coherency request is a read-to-own 
request. 

24. The multiprocessing computer system as recited in 
claim 17 wherein said coherency request is a read-to-share 
request. 

* * * * *